Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Neural Information Processing Systems

We consider a covariate shift problem where one has access to several different training datasets for the same learning problem and a small validation set which possibly differs from all the individual training distributions. The distribution shift is due, in part, to \emph{unobserved} features in the datasets. The objective, then, is to find the best mixture distribution over the training datasets (with only observed features) such that training a learning algorithm using this mixture has the best validation performance. Our proposed algorithm, \textsf{Mix\&Match}, combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions) over the space of mixtures, for this task. We prove a novel high probability bound on the final SGD iterate without relying on a global gradient norm bound, and use it to show the advantages of model re-use. Additionally, we provide simple regret guarantees for our algorithm with respect to recovering the optimal mixture, given a total budget of SGD evaluations.
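To make the described search concrete, here is a minimal, self-contained sketch of the general idea (not the authors' implementation): an optimistic tree search over the mixture weight, where each child's model is warm-started from its parent's partially trained model (model re-use) and leaves are scored by validation loss minus a width-based exploration bonus. The toy Gaussian setup, the helper names `sgd_steps` and `val_loss`, and the specific bonus are all illustrative assumptions.

```python
import random

random.seed(0)

# Toy problem: two training "datasets" are Gaussians with different means;
# the validation set is drawn from an (unknown) mixture of them.
# (Illustrative setup, not the paper's experiments.)
THETAS = [0.0, 1.0]                                  # means of the two training distributions
VAL = [random.gauss(0.7, 0.1) for _ in range(200)]   # validation samples

def sgd_steps(w, alpha, n_steps=50, lr=0.05):
    """Run a few SGD steps on samples from the alpha-mixture.
    alpha = probability of drawing a sample from dataset 1."""
    for _ in range(n_steps):
        theta = THETAS[0] if random.random() < alpha else THETAS[1]
        y = random.gauss(theta, 0.1)
        w -= lr * (w - y)                            # gradient of 0.5 * (w - y)^2
    return w

def val_loss(w):
    return sum(0.5 * (w - y) ** 2 for y in VAL) / len(VAL)

# Optimistic tree search over the mixture weight alpha in [0, 1].
# Each leaf is an interval; expanding a leaf splits it and continues SGD
# from the leaf's partially trained model (model re-use).
leaves = [(0.0, 1.0, sgd_steps(0.0, 0.5))]           # (lo, hi, partially trained model)
for _ in range(20):
    # Pick the optimistically best leaf: low validation loss, with an
    # exploration bonus proportional to the interval width.
    lo, hi, w = min(leaves, key=lambda n: val_loss(n[2]) - (n[1] - n[0]))
    leaves.remove((lo, hi, w))
    mid = (lo + hi) / 2
    for c_lo, c_hi in [(lo, mid), (mid, hi)]:
        alpha = (c_lo + c_hi) / 2
        leaves.append((c_lo, c_hi, sgd_steps(w, alpha)))  # warm-start from parent's w

best = min(leaves, key=lambda n: val_loss(n[2]))
print("recommended mixture weight for dataset 1:", round((best[0] + best[1]) / 2, 3))
```

Because the validation mean here is 0.7, the search should settle near a mixture weight of about 0.3 for dataset 1 (mean 0.0); the point of the sketch is only how model re-use lets each expansion pay for a few SGD steps rather than a full training run.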



Review for NeurIPS paper: Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Neural Information Processing Systems

The reviewers generally liked this paper and also provided a number of suggestions for improvement. Please take these recommendations seriously when revising the paper. In particular, I agree with Reviewer 4 that the informal theorem statements in the main body obscure many details. Theorem 2, in particular, seems simultaneously too formal (are all the exact numeric constants needed?) and yet still obscures important details. Overall, the ideas are interesting, but I found the paper somewhat messy to read.



Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Faw, Matthew, Sen, Rajat, Shanmugam, Karthikeyan, Caramanis, Constantine, Shakkottai, Sanjay

arXiv.org Machine Learning

We consider a co-variate shift problem where one has access to several marginally different training datasets for the same learning problem and a small validation set which possibly differs from all the individual training distributions. This co-variate shift is caused, in part, by unobserved features in the datasets. The objective, then, is to find the best mixture distribution over the training datasets (with only observed features) such that training a learning algorithm using this mixture has the best validation performance. Our proposed algorithm, ${\sf Mix\&Match}$, combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions) over the space of mixtures, for this task. We prove simple regret guarantees for our algorithm with respect to recovering the optimal mixture, given a total budget of SGD evaluations. Finally, we validate our algorithm on two real-world datasets.